Background:



Since the inception of ESEA Title I in 1965, Title I programs have been providing annual
evaluation reports to their respective states, and the states in turn to the federal government.
These reports individually provided information about both the program processes and program
outcomes. Yet, collectively the data from these individual reports could not be aggregated at
any level.


Of course, from a policy and decision-making level, both state and national, the question
arises "Is the program working?", that is to say, "Are the children learning more as a result of
the program?" These questions can be extrapolated to the following budgetary one, "Is the
dollar input of the program equaling the dollar output of the program?", which again is more
succinctly stated "Are we getting the bang for the buck?" Directly or indirectly all of these
questions, both pedagogical and fiscal, gave rise to Congress' pleas and demands of the United
States Office of Education to develop an evaluation and reporting system which would reflect
a systematic attempt to formalize data collection and reporting practices across the states.
The intent was to ensure that uniform, relevant and meaningful data would be available to
educators at the local, state and national levels.



USOE's response to the Congressional mandate was the development of a system
(Title I Evaluation and Reporting System, TIERS) which in essence is designed to collect and
summarize data on six topical areas covered by the Title I program, including: student
participation, staffing, parental involvement (PAC), in-service training, cost and student impact.
The initial thrust of the system is on student outcome data in projects providing instruction
in the areas of reading, language arts, and mathematics.

The system which legally became effective in the fall of 1979 is comprised of three 
outcome evaluation models or designs: 

Model A: the norm-referenced model 

Model B: the control group model

Model C: the special regression model

All three models are designed to be used with any valid and reliable norm-referenced test or
criterion-referenced test. Additionally, each of the models requires both pretesting and
posttesting and imposes some special conditions and restrictions on the testing itself. The three
models each provide data on an observed post-treatment performance measure and an estimate
of what that performance would have been without the program (i.e., without the treatment).

Impact gains are reported in what is termed a Normal Curve Equivalent (NCE) scale,
which is a 99-point scale tied to a distribution of test scores of a nation-wide representative
sample of students and matches the percentile ranks of that distribution at values of 1, 50
and 99.
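
The relationship between percentile ranks and NCEs described above can be sketched in a few lines. This is a minimal illustration and not part of TIERS itself: it assumes the conventional definition NCE = 50 + 21.06 z, where z is the normal deviate corresponding to the percentile rank, and the function name `percentile_to_nce` is hypothetical.

```python
from statistics import NormalDist

# Scale factor conventionally used for NCEs (an assumption of this sketch);
# it makes percentile ranks 1, 50 and 99 map to NCEs of about 1, 50 and 99.
NCE_SCALE = 21.06


def percentile_to_nce(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return 50.0 + NCE_SCALE * z


# Print the conversion at a few percentile ranks.
for pr in (1, 25, 50, 75, 99):
    print(pr, round(percentile_to_nce(pr), 1))
```

Because the NCE scale is an equal-interval transformation of the normal deviate, NCE gains can be averaged across students, which is not true of percentile ranks.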



As designed, TIERS begins data collection at the project (unique combination of personnel,
resources, methods and activities that define a particular treatment) level or school level. With
Model A some preliminary analysis of impact data occurs at the school level. Data are then
aggregated and analyzed at the LEA with the resultant analysis reported to the state. The
state (SEA) then in turn aggregates its data and reports it to the federal government.


The system as described, although having a number of rigorous technical and implementation
rules, is basically a decentralized evaluation and reporting system that peaks or pyramids
in terms of the data, which initiates at the project level in a local district, later to be captured
in a national snapshot of program impact. The issue which looms paramount is that of quality
control. That is to say, how good are the data which are being aggregated at all levels and hence
how valid are the policy decisions which can be derived from that data?




It is useful at this point to examine a schematic of the projected aggregate reporting
model:

[Figure I schematic: Raw data are organized by model (A, B, C; non-comparability of results
depending on model used), by grade (2 through 12), by subject (reading, language arts,
mathematics) and by project (project vectors: time of instruction; per pupil costs; instructor
to pupil ratio). The data flow from the school aggregate (for some models) to the LEA
aggregate, then to the SEA aggregate, then to the Federal aggregate, with data quality control
points marked along the flow.]

FIGURE I
AGGREGATE REPORTING MODEL
To date little to no work has been done at any level on the issue of quality control and
quality assurance of the data which comprise the elements of the reporting system. As can be
seen, there are three major data quality control points to be dealt with in the Quality Control System,
each reporting entity needing to grapple with the problem at its various stages of complexity.



    The LEA must question the veracity of its data.

    The SEA must question the veracity of the data it receives from its LEAs.

    The Department of Education must question the veracity of the data it
    receives from the states.



The reporting model is a cumulative model, and hence if strict quality control procedures are not
put into action we will cumulate data error to the federal level and be in no better position to say
anything meaningful on the output of Title I programs than we were able to say in 1974 (pre-TIERS).



Brief Literature Review:



Colleagues are beginning to conceptualize and deal with the issues of quality control of
data and sources of data error. A scan of the literature related to what colleagues term "quality
control" is scant in terms of documentation and, I believe, focused more on implementation
errors at the local level than on the concept of statistical quality control, which will be explained
later.

Crane and [illegible] (unpublished, 1979) analyzed some 40 district reports from Illinois
Title I districts which used Model A1. The purpose of their analysis was: (1) to determine the
types of errors made and to determine, if possible, the effects on NCE gain estimates; (2) to
provide information to design quality control procedures. Their findings indicate high error rates
in critical areas of technical implementation of the models, as well as problems related to aggregate
n's, floor and ceiling effects and the effects of error on NCE gain estimates. From their findings,
they see quality control procedures as critical to further reduce the error rates that they found in
analyzing their sample data. The Crane and [illegible] data base is extremely useful for comparison
purposes.

As recently as January of 1980, Hiscox and Deck in an unpublished paper discuss a
comprehensive approach to the issue of quality control of Title I data. Their approach
correctly points out that since TIERS is a layered reporting system (LEA to the state,
and the state to the federal government), quality control requires consideration and
resolution at all of these levels. They suggest a four stage model to pinpoint error: error in
planning, implementation, analysis and reporting. The benefit of the work of Hiscox and Deck
is in the classification schema of the potential errors by category and the suggested corrective
action to be taken respective to the importance of the error type.

View on a Definition of Quality Control

It would seem to me that resolution of error in the various stages of LEA implementation
of the evaluation and reporting system will be a bold step forward in establishment of a clean
data base. Nonetheless, at the state level, which is the level concerning the focus of this paper,
quality control takes on some added meanings. Given limitations of time, personnel and
resources, most states including New Jersey must rely on a set of procedures (call them quality
control procedures) to examine the aggregated data base which they receive from their LEAs.
Very few states have the sophistication of an evaluation data audit function or the overall
computer capability to manage the summative analysis of individual student data at the state
level. Quality control is the issue which confronts us. As I define it, quality control techniques
are superimposed on a data set after reasonable attempts have been made to satisfy the technical
and implementation concerns of the Title I Evaluation and Reporting models. I don't view
debugging of problems in LEA planning, implementation, analysis and reporting in the same
light as I do "statistical quality control." Statistical quality control, as I would like to apply
it to an educational setting, is in essence an industrial concept. Quite simply, many of the
techniques developed by mathematical statisticians for the analysis of data may be used in the
control of product quality. The basic foundations of the statistical quality control model applied
to industrial data are briefly outlined below.

Statistical Quality Control: The Industrial Model

Statistical quality control should be viewed as a kit of tools which may influence
decisions which are related to the functions of specification, production or inspection. There
are four separate but related techniques that constitute the most common working statistical
tools in quality control. These tools are:

1. The Shewhart control charts for measurable quality characteristics.
   These are described as charts for variables, or as charts for X̄ and R
   (average and range) and charts for X̄ and σ (average and standard
   deviation).

2. The Shewhart control chart for fraction defective (p chart).

3. The Shewhart control chart for number of defects per unit (c chart).

4. That portion of sampling theory which deals with the quality
   protection given by any specified sampling acceptance procedure.

As explained by W. A. Shewhart (1939, pg. 49): "Measured quality of manufactured
product is always subject to a certain amount of variation as a result of chance. Some stable
'system of chance causes' is inherent in any particular scheme of production and inspection.
Variation within this stable pattern is inevitable. The reasons for variation outside this stable
pattern may be discovered and corrected." It is quite clear that the power of the Shewhart
technique lies in its ability to separate out these assignable causes of quality variation. This
is done post hoc through an examination of the outliers which fall outside of the pre-established
upper and lower quality control limits (however wide or narrow these tolerances are
set).

Not only is the control chart a powerful technique for industrial applications, but it can
also be applied to the educational setting as it specifically relates to the Title I data base.

Statistical Quality Control: The Industrial Model Applied 

The concept basically is one of assuming that the range of output performance varies
within an "acceptable range" (upper control limit and lower control limit), graphically
represented as follows:

[Figure: control chart. Data points (X) are scattered around a center line marking average (X̄)
performance, between an upper control limit and a lower control limit. Points above the upper
control limit (upper X) and below the lower control limit (lower X) are the outliers.]

Performance of course can and does fall outside of the control limits, and hence outlier
data points can be noted. Therefore, a control chart can provide the following types of information:

1. Basic variability of the performance characteristics;

2. Consistency of performance; and

3. Average level of performance.

The upper and lower control limits on any control chart can be established by examining
the empirical data base and determining how much tolerance or variability one wishes to tolerate
in the system. Data falling outside of the upper and lower control limits can then be examined as
to why those data are showing up there.

    E.g., if data fall above the upper control limit,
    this may signal (trigger) an exemplary program
    for further review. (One of the established
    purposes of TIERS.)

    If data fall below the lower control limit,
    this may signal (trigger) problems with
    a program or intervening variable which
    require further review.
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N«w JTsty Statt DtDartmtntof Education's Propostd Quality Contrct) and Quality Assurance 
Systenf^i • 

The New Jersey State Department of Education is planning to implement a 3-phase overall
quality assurance and quality control approach for its FY'80 compensatory data base (Title
I, State Compensatory Education, and those compensatory programs funded by other sources)
as follows:

Phase I:    All New Jersey Basic Skills Preventive and Remedial Programs must
            satisfy the technical and implementation requirements of the
            evaluation and reporting models.

            Additional materials and training are being made available to ensure
            a sufficient knowledge base at the LEA level. Initial compliance
            with the technical implementation and programmatic implementation
            constructs delineated below will be determined through county
            office and SEA program monitors.



Interpreting the educational impact gains demonstrated through implementation
of Model A1, A2, C1, and C2 requires consideration of technical and programmatic
characteristics before sound decision-making and program planning can occur. In
review of LEA data, the following issues must be considered:

I.  LEA district programs that show negative or no impact
    gain for a grade or subject should be reviewed. To assure
    gain is not reflecting just inaccurate or improper
    implementation, the following areas should be reviewed in detail.

Technical Implementation Criteria:

1.  Test administration occurred at or near the norming
    date for pre and post evaluation. If administration
    occurred more than two weeks on either side of
    the norming date, accurate interpolation procedures
    were implemented.

2.  District mean was aggregated by:

    a.  adding all students and then
        dividing for district mean.

    b.  weighting class and building
        means to obtain LEA mean.

3.  Data were aggregated on students who had both
    pre and post test scores.

4.  Out-of-level conversions were implemented properly.

5.  Floor or ceiling effects did not occur at pre or post
    testing periods.

6.  Conversions to NCE scores were accurate.

7.  Model A2 only: Correlation coefficients
    were higher than .6

8.  Model A1, A2: Students were not selected on the
    pre-test.

9.  Model C: A strict cutoff score was utilized.

10. CRT used for Model A2 or C2 had demonstrated
    validity and reliability.

11. Appropriate level and same form was administered
    at pre and post time.
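
Criterion 2 above allows two routes to a district mean: pooling all students and dividing, or weighting class and building means. The sketch below uses invented building scores and hypothetical function names. It illustrates that the two routes agree only when each building mean is weighted by its number of students; a simple unweighted average of building means generally gives a different figure.

```python
def pooled_mean(scores_by_building):
    """Route (a): add all students' scores, then divide by the student count."""
    all_scores = [s for scores in scores_by_building for s in scores]
    return sum(all_scores) / len(all_scores)


def weighted_mean_of_means(scores_by_building):
    """Route (b): weight each building mean by its number of students."""
    total_n = sum(len(scores) for scores in scores_by_building)
    weighted_sum = sum(len(scores) * (sum(scores) / len(scores))
                       for scores in scores_by_building)
    return weighted_sum / total_n


def unweighted_mean_of_means(scores_by_building):
    """Common error: average the building means without weighting."""
    means = [sum(scores) / len(scores) for scores in scores_by_building]
    return sum(means) / len(means)


# Invented NCE scores for two buildings of unequal size.
buildings = [[40.0, 42.0, 44.0, 46.0], [30.0, 34.0]]
print(pooled_mean(buildings))
print(weighted_mean_of_means(buildings))
print(unweighted_mean_of_means(buildings))
```

The gap between the weighted and unweighted figures grows as building sizes become more unequal, which is one way an aggregation error can enter the LEA mean.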



Programmatic Implementation Criteria

1.  Tests administered for evaluation reflected the
    curriculum of the compensatory project. At
    least 75% of the items of the instrument
    measured skills taught in the compensatory
    project (content validity).

2.  Data were aggregated only on students who
    fulfilled the following criteria:

    a.  Participated in a program more than
        four months.

    b.  Program was fully operating for the
        period between pre and post testing.

    c.  Students attended the program at
        least 2/3 of the time.

II. LEA district programs that show impact gain above 20 NCE points for
    any grade or subject should be reviewed. To assure gain is not reflecting
    inaccurate or improper implementation, all items in Item I should be
    reviewed plus:

    1.  Tests administered to the students have not been utilized
        more than twice for any student.

    2.  Test instruments were two or more grades below
        the grade in which the student is currently enrolled.

Non-Test Data

    1.  Attendance, attitudinal scales, or other indicators of
        student improvement should be available to support
        test findings.

    2.  Aggregation of non-test data must be implemented
        as follows:

        a.  Developmental Project: Indicators should
            be aggregated only on students who have been in
            the developmental project more than two years.

        b.  Compensatory Project: Indicators should be
            aggregated only on students in the project more
            than four months.
Phase II:   This phase is conducted at the state level and consists of preliminary
            computer edit routines run on the aggregated data received from the
            LEAs. The computer edit routines will be designed to kick out data
            for the following reasons:

            1.  Testing not conducted at the norming date

            2.  Inappropriate form and level of the test used
                for pre-testing and post-testing

            3.  NCE gains which are too high or too low
                (this needs to be defined). For discussion's
                sake the edit routine might be designed to
                bounce out, let's say, an NCE gain of 45 and
                one of -10.

            4.  Errors of conversion from percentile to NCE,
                etc.
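
A Phase II edit routine of the kind listed above might look like the following sketch. The record layout, field names and the `edit_checks` function are assumptions made for illustration only. The two-week testing window is borrowed from the Phase I technical criteria, the same-form comparison is one possible reading of "appropriate form", and the NCE gain bounds of 45 and -10 are the discussion values mentioned above, not adopted thresholds.

```python
from datetime import date

# Illustrative thresholds (assumptions, see the lead-in above).
MAX_DAYS_FROM_NORMING = 14
NCE_GAIN_HIGH = 45.0
NCE_GAIN_LOW = -10.0


def edit_checks(record):
    """Return the list of reasons a district record should be kicked out."""
    reasons = []
    for period in ("pre", "post"):
        tested = record[f"{period}_test_date"]
        norming = record[f"{period}_norming_date"]
        if abs((tested - norming).days) > MAX_DAYS_FROM_NORMING:
            reasons.append(f"{period}-test not given near the norming date")
    # Assumption: an appropriate form means the same form at pre and post.
    if record["pre_test_form"] != record["post_test_form"]:
        reasons.append("different test forms at pre and post")
    gain = record["post_nce"] - record["pre_nce"]
    if gain >= NCE_GAIN_HIGH or gain <= NCE_GAIN_LOW:
        reasons.append("NCE gain outside the plausible range")
    return reasons


# Invented record: post-test given 20 days late and a gain of 46 NCEs.
suspect = {
    "pre_test_date": date(1979, 10, 15), "pre_norming_date": date(1979, 10, 15),
    "post_test_date": date(1980, 5, 21), "post_norming_date": date(1980, 5, 1),
    "pre_test_form": "A", "post_test_form": "A",
    "pre_nce": 20.0, "post_nce": 66.0,
}
print(edit_checks(suspect))
```

Returning every failed check, rather than stopping at the first, lets the state send an LEA a single list of items to correct.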

Phase III:  This phase consists of the application of statistical quality control
            procedures using the Shewhart control chart. The upper control
            limits and lower control limits will be established by the state for
            the achievement data base.

For the state's review purposes, only the 5% of the data falling outside of the control
limits (2½% above, 2½% below) will be examined.

            As was mentioned earlier, data falling outside the
            upper control limit may signal (trigger) an exemplary
            program for further review, just as data falling
            outside the lower control limit may signal (trigger)
            problems with a program or some other intervening
            variable(s) which require further review.

An Application of Phase III Quality Control

Each year New Jersey districts, via the State's reporting structure, provide
annual reports delineating mean student achievement scores as well as pre
and post achievement test scores by grade for students in compensatory
programs (Title I, State Compensatory Education, locally-funded programs,
etc.).

Over the past two years, the Office of Evaluation has analyzed district
scores to determine program effectiveness for compensatory education
(Title I, State Compensatory Education, locally-funded programs, etc.)
populations.

Analyses are conducted in Normal Curve Equivalent (NCE) scores. This
type of score enables computational procedures that could not be conducted with
percentiles or grade equivalents. The scores offer the advantage of estimating
the relative performance of children based on the performance of their peers. As
a result, program expectations for post test achievement can be made based
on the scores of children on the pre tests, with reference to the scores of their
peers.




Over the past two years, almost all New Jersey districts have demonstrated
achievement gains in the compensatory education populations above the expected
gains. Based on the distribution of gains made by districts in each grade level, a
projection may be made of which districts are gaining most and which are gaining
least.

Also based on the distribution of gains, statistical quality control criteria
may be utilized to determine both the positive and negative outlying districts.
Because these gains vary so much by grade level, separate computations should
be made for each grade. The data from each year, collected in late summer,
should be used to establish these criteria for achievement performance for
the following year.

Dispersion of mean pre to post test differences is analyzed by means
of the standard error of differences. At each grade, the mean gains of 95%
of all districts will fall between + or - 1.96 times the standard error of
differences (S_D) from the mean gain at that grade level. For example, if all
districts averaged a gain of 11 NCEs in Computation at Grade 9 and the
standard error of differences was 2 (S_D = 2), then 95% of all districts gained
between 7.08 NCEs and 14.92 NCEs. By formula, S_D was computed as
follows:

$$ S_D = \sqrt{\frac{\sum (D - \bar{D})^2}{N(N-1)}} $$

where D is a district's mean pre to post gain, \bar{D} is the mean of those gains
across districts at that grade, and N is the number of districts.




In this way, the state data for each grade may be analyzed each year to
determine the top 2.5% and the bottom 2.5% of districts in achievement gains
at each grade level. The programs in these districts can then be examined to
determine why they fall outside of the upper or lower control limits.
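
The control limit computation described above can be sketched as follows. This is an illustration under stated assumptions, not the Department's actual routine: the function names are hypothetical, and S_D is taken to be the standard error of the mean difference, sqrt(sum((D - mean)^2) / (N(N-1))), computed over district mean gains at one grade.

```python
import math


def standard_error_of_differences(district_gains):
    """S_D over district mean gains (assumed form: sqrt(SS / (N(N-1))))."""
    n = len(district_gains)
    mean_gain = sum(district_gains) / n
    sum_sq = sum((d - mean_gain) ** 2 for d in district_gains)
    return math.sqrt(sum_sq / (n * (n - 1)))


def control_limits(mean_gain, s_d, z=1.96):
    """Lower and upper control limits: mean gain -/+ z times S_D."""
    half_width = z * s_d
    return mean_gain - half_width, mean_gain + half_width


# Worked example from the text: mean gain of 11 NCEs, S_D = 2.
lower, upper = control_limits(11.0, 2.0)
print(round(lower, 2), round(upper, 2))
```

With N(N-1) in the denominator, S_D shrinks as the number of districts grows, so the band between the limits narrows for grades in which many districts report.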

An example utilizing the concept of statistical quality control follows.
The data are based on achievement scores for New Jersey students receiving
compensatory programs for FY'79. The data did not undergo Phase I of
the proposed quality control procedure but did undergo Phase II of the
procedures (computer edit routines were run). These data are provided as
an illustrative example of the Quality Control model proposed and should
not be used to make any statements about New Jersey compensatory
programs.

                            Example I
                          State Summary
              Computation Programs (Title I, State
     Compensatory Programs, locally funded programs, etc.)

          Mean                       Lower Control    Upper Control
Grade     Gain (NCE)    1.96 x S_D   Limit (NCE)      Limit (NCE)

  1       17.811        5.968        11.844           23.779
  2       12.366        2.084        10.282           14.460
  3       13.136        1.600        11.536           14.736
  4       11.241        1.460         9.781           12.701
  5        9.846        1.639         8.206           11.485
  6       10.128        1.778         8.350           11.906
  7        8.655        1.641         7.014           10.296
  8        6.441        1.139         5.302            7.579
  9        5.902        1.969         3.933            7.871
 10        6.807        1.688         5.119            8.495
 11        7.343        1.641        [missing]         8.884
 12        4.810        1.207        [missing]         6.017



                            Example II
                          State Summary
             Communication Programs (Title I, State
     Compensatory Programs, locally funded programs, etc.)

          Mean                       Lower Control    Upper Control
Grade     Gain (NCE)    1.96 x S_D   Limit (NCE)      Limit (NCE)

  1       [missing]     3.454        10.231           17.139
  2       [missing]     1.526         9.632           12.684
  3       [missing]     3.829         6.555           14.213
  4       [missing]     1.135         6.211            8.481
  5       [missing]     1.370         5.623            8.363
  6       [missing]     1.6??         2.3??            5.?12
  7       [missing]     1.015         5.966            7.996
  8       [missing]     1.015         2.938            4.968
  9       [missing]     [illegible]   4.444            6.374
 10       [missing]     1.677         3.716           [missing]
 11       [missing]     1.267         2.?06           [missing]
 12       [missing]     0.999         3.321            5.319

It must be stressed that any gain in NCEs indicates program effectiveness.
Those gains above the "high" levels specified, however, may be interpreted as
significantly greater than the average of those compensatory education programs
that report their results correctly.

This preliminary data set shows interesting results:

1.  Differences in performance between achievement in
    computation and communication

2.  Higher NCE gains in the lower elementary grades

3.  Wide tolerance/control limits in the lower elementary
    grades (more variability in performance)

4.  Smaller tolerance/control limits in the higher grades
    (less variability in performance)

The ultimate purpose of the data set, however, is to demonstrate quality
control concepts with an actual data set. The next step in this state analysis
would be 1) to array all of the data from individual districts by grade, 2) to
superimpose the upper control limit and the lower control limit and 3) to
identify the outlier data points for further analysis.
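
The three steps just described can be sketched in a few lines. The district names and gains below are invented for illustration, and `flag_outliers` is a hypothetical helper; only the Grade 3 computation limits (11.536 and 14.736) are taken from Example I.

```python
def flag_outliers(gains_by_district, lower_limit, upper_limit):
    """Return (districts above the upper limit, districts below the lower limit)."""
    above = sorted(d for d, g in gains_by_district.items() if g > upper_limit)
    below = sorted(d for d, g in gains_by_district.items() if g < lower_limit)
    return above, below


# Step 1: array the district data for one grade (invented mean NCE gains).
grade3_gains = {
    "District A": 12.9,
    "District B": 15.4,
    "District C": 10.8,
    "District D": 13.6,
}

# Steps 2 and 3: superimpose the Grade 3 limits and identify the outliers.
above, below = flag_outliers(grade3_gains, lower_limit=11.536, upper_limit=14.736)
print("Possible exemplary programs:", above)
print("Programs needing review:", below)
```

Districts in the first list would be candidates for review as exemplary programs, and those in the second for review of possible program or data problems.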

Conclusions

This paper has presented an industrial model of statistical quality control for application
at the state level on data generated by the Title I Evaluation and Reporting System (TIERS).